DETAILED ACTION

1. Claims 1-11 are pending.



Drawings 

2. It is requested that more specific labels be applied to the boxes in Figures 1 and
2, to better discern what is being represented without looking into the specification.



Specification 

3. Applicant is reminded of the proper language and format for an abstract of the 
disclosure. 

The abstract should be in narrative form and generally limited to a single 
paragraph on a separate sheet within the range of 50 to 150 words. It is important that
the abstract not exceed 150 words in length since the space provided for the abstract 
on the computer tape used by the printer is limited. The form and legal phraseology 
often used in patent claims, such as "means" and "said," should be avoided. The 
abstract should describe the disclosure sufficiently to assist readers in deciding whether 
there is a need for consulting the full patent text for details. 

The language should be clear and concise and should not repeat information 
given in the title. It should avoid using phrases which can be implied, such as, "The 
disclosure concerns," "The disclosure defined by this invention," "The disclosure 
describes," etc. 

The abstract is intended to be a concise statement of the technical disclosure of 
the patent, while it appears in the instant application that it is a copy of Claim 1, and not
the invention as a whole. 



4. The title of the invention is not descriptive. A new title is required that is clearly 
indicative of the invention to which the claims are directed. 



Application/Control Number: 10/643,586 Page 3 

Art Unit: 2183 

5. On page 1 of the specification, in the Related Applications field, the Patent
Application numbers have not been filled in. It is requested that the Applicant fill in the
blanks as appropriate.

Claim Objections 

6. Claim 6 is objected to for improperly depending on Claim 3. As Claim 6 is 
identical to Claim 4, it is believed that it was intended to depend on Claim 5, and is 
assumed as such for the remainder of the office action. Appropriate correction is 
required. 

Claim Rejections - 35 USC § 102 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

7. Claim 1 is rejected under 35 U.S.C. 102(b) as being anticipated by Beard et al. 
(USPN 5,430,884, herein Beard). 

8. As per Claim 1, Beard teaches: In a computer system having a scalar processing
unit and a vector processing unit (Column 3, Lines 1-7), wherein the vector processing
unit includes a vector dispatch unit (Column 3, Lines 27-40), a method of executing a
vector memory instruction having a scalar operand, the method comprising:

reading the scalar operand, wherein reading includes transferring the scalar 
operand from the scalar processing unit to the vector dispatch unit (Column 3, Lines 38- 
40 and Column 12, Lines 25-36);

determining if the vector memory instruction is scalar committed; and
if the vector memory instruction is scalar committed, executing the vector 
memory operation as a function of the scalar operand (Column 12, Lines 26-40 disclose 
the scalar operands being put into a queue from the S registers. Generally in computing 
systems, the result of a scalar operation is not available outside the pipeline until it is 
committed, and there is no evidence in the specification to say otherwise, therefore the 
scalar operand in the queue would not be there until it is "scalar committed", which then 
allows the vector instruction to initiate, or begin execution). 

Claim Rejections - 35 USC § 103 

9. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

10. Claims 2-4, and 7 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Beard, in view of Patterson et al. (herein Patterson).

11. As per Claim 2, Beard teaches: The method according to claim 1, wherein
executing the vector memory operation includes translating an address associated with
the vector memory operation (Column 31, Lines 3-8), but fails to teach:

and trapping on a translation fault.

While Beard does not explicitly teach trapping on a translation fault, Patterson
teaches that a way to save the pipeline state safely in the event of an exception (also 
known as a fault) is to insert a trap in the pipeline (Page 183). A translation fault is an 
example of an exception, and thus would require a trap in order for the processor to 
regain correct operating status. One of ordinary skill in the art would recognize the use 
of a trap in order to regain control of the system, which is required in order for the 
computer system to continue operating properly. Therefore, one of ordinary skill in the 
art at the time the invention was made would have recognized the need to trap in the 
event of a translation fault. 

12. As per Claim 3, Beard teaches: In a computer system having a scalar processing 
unit and a vector processing unit, a method of decoupling vector data loads from vector 
instruction execution, comprising: 

generating an address for a vector load (Column 12, Lines 40-44 show how an 
address can be generated); 

issuing a vector load request to memory (Column 23, Lines 24-26); 

receiving vector data from memory (Column 23, Lines 24-26, when a load is 
requested from memory, it has to be received); 

executing a vector instruction on the vector data stored in the vector register 
(Column 3, Lines 27-40), but fails to teach: 

storing the vector data in a load buffer; 

transferring the vector data from the load buffer to a vector register.

While Beard teaches an instruction cache, which consists of multiple buffers
(Column 7, Lines 52-55), he does not explicitly teach a data cache, which is commonly 
used in computer systems to vastly increase performance in loads and stores. 
Patterson teaches that most computers have 2 levels of caches now, both instruction 
and data caches, with Pages 380-381 showing an example data cache. As it is well 
known in the art, the cache would read the data directly from memory (or from a higher 
cache level), and the computer system would then read the data from the cache (buffer) 
into the appropriate register(s). One of ordinary skill in the art would have recognized 
the need to add a data cache into Beard's system, for the same reason there is an
instruction cache: increased performance and faster memory reads. Therefore, one of
ordinary skill in the art at the time the invention was made would have added a data
cache (which can function as a load buffer) into Beard's invention to increase
performance.
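
The data path described above (memory to buffer, then buffer to register) can be sketched as follows. This is an illustrative sketch only; it is not taken from Beard or Patterson, and the names LoadBuffer, receive, and drain_one are hypothetical.

```python
from collections import deque


class LoadBuffer:
    """Hypothetical FIFO that holds returned load data until it is moved to a register."""

    def __init__(self):
        self.pending = deque()

    def receive(self, dest_reg, data):
        # Data returning from memory is parked in the buffer, not written to a register.
        self.pending.append((dest_reg, list(data)))

    def drain_one(self, vector_regs):
        # A later, separate step transfers the oldest entry into its destination register.
        dest_reg, data = self.pending.popleft()
        vector_regs[dest_reg] = data
        return dest_reg


vector_regs = {"V0": None, "V1": None}
buf = LoadBuffer()
buf.receive("V1", [1.0, 2.0, 3.0])      # memory response arrives
parked = vector_regs["V1"]              # register is still untouched at this point
moved = buf.drain_one(vector_regs)      # transfer from buffer to register
```

Because the register is written only in the drain step, the load request and the register write are decoupled in time, which is the property at issue in the load buffer limitations.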

13. As per Claim 4, Beard teaches: The method according to claim 3, wherein the
vector processing unit includes a vector execute unit (Column 3, Lines 1-7, both the
scalar and vector processors have functional units, which execute instructions) and a
vector load/store unit (Patterson, Page B-5, the vector load-store unit), wherein issuing 
a vector load request to memory includes issuing and executing vector memory 
references in the vector load/store unit when the vector load store unit has received the 
instruction and memory operands from the scalar processing unit (Column 3, Lines 27- 
40). While Beard did not explicitly disclose a vector load/store unit, Patterson discloses 
that it is an essential unit of a basic vector architecture to load and store to/from vectors.
Therefore, a vector load/store unit would necessarily have to be present in Beard's
invention in order for it to function.

14. As per Claim 7, Beard teaches: A computer system, comprising: 
a scalar processing unit (Column 3, Lines 1-7); and 
a vector processing unit (Column 3, Lines 1-7); 

wherein the vector load/store unit receives an instruction and memory operands 
from the scalar processing unit, issues and executes a vector memory load reference as 
a function of the instruction and the memory operands received from the scalar
processing unit (Column 3, Lines 27-40), but fails to teach:

and stores data received as a result of the vector memory reference in a load 
buffer; and 

wherein the vector execute unit issues the vector memory load instruction and 
transfers the data received as a result of the vector memory reference from the load 
buffer to a vector register. 

While Beard teaches an instruction cache, which consists of multiple buffers 
(Column 7, Lines 52-55), he does not explicitly teach a data cache, which is commonly 
used in computer systems to vastly increase performance in loads and stores. 
Patterson teaches that most computers have 2 levels of caches now, both instruction 
and data caches, with Pages 380-381 showing an example data cache. As it is well 
known in the art, the cache would read the data directly from memory (or from a higher 
cache level), and the computer system would then read the data from the cache (buffer)
into the appropriate register(s). One of ordinary skill in the art would have recognized
the need to add a data cache into Beard's system, for the same reason there is an
instruction cache: increased performance and faster memory reads. Therefore, one of
ordinary skill in the art at the time the invention was made would have added a data
cache (which can function as a load buffer) into Beard's invention to increase
performance.

15. Claims 5-6, and 8-9 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Beard, in view of Gharachorloo et al. (herein Gharachorloo).

16. As per Claim 5, Beard teaches: In a computer system having a scalar processing 
unit and a vector processing unit, a method of decoupling vector data loads from vector 
instruction execution, comprising: 

generating a first and a second address for a vector load (Column 12, Lines 40- 
44 show an example of an address being generated in the scalar processing unit, and 
Column 3, Lines 1-3 is one of many examples showing Beard's invention is capable of
executing multiple instructions, and Column 12, Lines 26-28 show that a load can have 
multiple addresses); 

issuing first and second vector load requests to memory (Column 23, Lines 24-26);

receiving vector data associated with the first and second addresses from
memory (Column 23, Lines 24-26, when a load is requested from memory, it has to be 
received); 

storing vector data associated with the first address in a first vector register 
(Column 23, Lines 24-33); 

storing vector data associated with the second address in a second vector 
register (Column 23, Lines 24-33); 

executing a vector instruction on the vector data stored in the first vector register 
(Column 3, Lines 27-40); 

executing a vector instruction on the vector data stored in the second vector 
register (Column 3, Lines 27-40), but fails to teach: 

renaming the second vector register. 

While Beard does not explicitly teach renaming, Gharachorloo teaches a
reservation station and a reorder buffer (Page 5, Section 4.2). A reservation station is
well known in the art as a way for a processor to execute instructions out of order, and
Beard does teach an out-of-order machine (Column 23, Lines 20-21). A reservation
station acts as a queue or buffer to hold decoded instructions before execution. It allows
the instruction decoding to be decoupled from instruction execution, allowing dynamic
scheduling of instructions. The accompanying reorder buffer eliminates storage
conflicts through register renaming (Page 5, Section 4.2). The reorder buffer also allows
for storage of speculative results, to potentially increase the workload of the processor.
One of ordinary skill in the art would recognize the advantage of dynamic scheduling of
instructions as potentially bypassing unnecessary conflicts (along with the use of a
reorder buffer) and increasing performance, and to also allow speculative execution.
Therefore, one of ordinary skill in the art at the time the invention was made would have
used a reservation station and a reorder buffer (and thus been able to rename
registers) to implement the out-of-order execution in Beard's invention to increase
performance.
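
As a general illustration of how register renaming removes a storage conflict between two loads that name the same destination register, a minimal sketch follows. It is not drawn from Gharachorloo or Beard; the class RenameTable, its methods, and the tag format "P0", "P1" are hypothetical names chosen for the example.

```python
class RenameTable:
    """Hypothetical map from architectural register names to physical tags."""

    def __init__(self, arch_regs):
        # None means the architectural register has not been renamed yet.
        self.mapping = {reg: None for reg in arch_regs}
        self.next_tag = 0

    def rename_dest(self, reg):
        # Every new write to a register receives a fresh physical tag.
        tag = f"P{self.next_tag}"
        self.next_tag += 1
        self.mapping[reg] = tag
        return tag

    def lookup_src(self, reg):
        # A reader sees the most recent tag, or the architectural name if never renamed.
        tag = self.mapping[reg]
        return tag if tag is not None else reg


table = RenameTable(["V0", "V1"])
first_load = table.rename_dest("V1")    # first vector load writes V1
consumer = table.lookup_src("V1")       # instruction operating on the first load's data
second_load = table.rename_dest("V1")   # second load also names V1 but gets a new tag
```

The second load does not have to wait for the consumer of the first load to finish, because the two loads write different physical locations.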

17. As per Claim 6, Beard teaches: The method according to claim 5, wherein the 
vector processing unit includes a vector execute unit (Column 3, Lines 1-7, both the 
scalar and vector processors have functional units, which execute instructions) and a
vector load/store unit (Patterson, Page B-5, the vector load-store unit), wherein issuing 
a vector load request to memory includes issuing and executing vector memory 
references in the vector load/store unit when the vector load store unit has received the 
instruction and memory operands from the scalar processing unit (Column 3, Lines 27- 
40). While Beard did not explicitly disclose a vector load/store unit, Patterson discloses 
that it is an essential unit of a basic vector architecture to load and store to/from vectors. 
Therefore, a vector load/store unit would necessarily have to be present in Beard's
invention in order for it to function.

18. As per Claim 8, Beard teaches: In a computer system having a scalar processing 
unit and a vector processing unit, a method of decoupling scalar and vector execution, 
comprising:

dispatching a vector instruction that requires scalar operands to the scalar
instruction queue and to a vector instruction queue (The scalar required part of the 
instruction goes into the scalar queue, and the vector queue is disclosed in Column 3, 
Lines 34-36); 

executing the vector instruction in the scalar processing unit, wherein executing 
the vector instruction in the scalar processing unit includes writing a scalar operand to a 
scalar operand queue (Column 3, Lines 38-40); 

notifying the vector processing unit that the scalar operand is available in the 
scalar operand queue (Column 11, Lines 44-53. It is notified by a bit in the scalar 
register scoreboard. When the vector processing unit looks into the scoreboard, it can 
determine if the scalar operand is available); and 

executing the vector instruction in the vector processing unit, wherein executing 
the vector instruction in the vector processing unit includes reading the scalar operand 
from the scalar operand queue (Column 12, Lines 26-36, the scalar operand data is 
transferred to the vector register unit as it begins execution), but fails to teach: 

dispatching scalar instructions to a scalar instruction queue. 

While Beard does not explicitly teach a queue to put scalar instructions in before
execution, Gharachorloo teaches a reservation station and a reorder buffer (Page 5,
Section 4.2). A reservation station is well known in the art as a way for a processor to 
execute instructions out of order, and Beard does teach an out-of-order machine 
(Column 23, Lines 20-21). A reservation station acts as a queue or buffer to hold 
decoded instructions before execution. It allows the instruction decoding to be
decoupled from instruction execution, allowing dynamic scheduling of instructions. One
of ordinary skill in the art would recognize the advantage of dynamic scheduling of 
instructions as potentially bypassing unnecessary conflicts (along with the use of a 
reorder buffer) and increasing performance, and to also allow speculative execution. 
Therefore, one of ordinary skill in the art at the time the invention was made would have 
used a reservation station and a reorder buffer to implement the out-of-order execution
in Beard's invention to increase performance.

19. As per Claim 9, Beard teaches: In a computer system having a scalar processing
unit and a vector processing unit, a method of decoupling scalar and vector execution, 
comprising: 

dispatching a vector instruction that requires scalar operands to the scalar 
instruction queue and to a vector instruction queue (The scalar required part of the 
instruction goes into the scalar queue, and the vector queue is disclosed in Column 3, 
Lines 34-36); 

executing the vector instruction in the scalar processing unit, wherein executing 
the vector instruction in the scalar processing unit includes generating an address and 
writing the address to a scalar operand queue (Column 3, Lines 38-40. In addition, 
Column 12, Lines 40-44 show an example of an address being generated in the scalar 
processing unit); 

notifying the vector processing unit that the address is available in the scalar 
operand queue (Column 11, Lines 44-53. It is notified by a bit in the scalar register 
scoreboard. When the vector processing unit looks into the scoreboard, it can determine
if the scalar operand is available); and 

executing the vector instruction in the vector processing unit, wherein executing 
the vector instruction in the vector processing unit includes reading the address from the 
scalar operand queue and generating a memory request as a function of the address 
read from the scalar operand queue (Column 12, Lines 26-36, the scalar operand data 
is transferred to the vector register unit as it begins execution. In addition, Column 12, 
Lines 40-44 show an example of an address being generated in the scalar processing 
unit), but fails to teach: 

dispatching scalar instructions to a scalar instruction queue. 

While Beard does not explicitly teach a queue to put scalar instructions in before
execution, Gharachorloo teaches a reservation station and a reorder buffer (Page 5,
Section 4.2). A reservation station is well known in the art as a way for a processor to 
execute instructions out of order, and Beard does teach an out-of-order machine 
(Column 23, Lines 20-21). A reservation station acts as a queue or buffer to hold 
decoded instructions before execution. It allows the instruction decoding to be 
decoupled from instruction execution, allowing dynamic scheduling of instructions. One 
of ordinary skill in the art would recognize the advantage of dynamic scheduling of 
instructions as potentially bypassing unnecessary conflicts (along with the use of a 
reorder buffer) and increasing performance, and to also allow speculative execution. 
Therefore, one of ordinary skill in the art at the time the invention was made would have 
used a reservation station and a reorder buffer to implement the out-of-order execution
in Beard's invention to increase performance.

20. Claims 10 and 11 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Beard and Gharachorloo, further in view of Patterson. 

21. As per Claim 10, Beard teaches: In a computer system having a scalar 
processing unit and a vector processing unit, a method of executing a vector instruction, 
comprising: 

dispatching a vector instruction to the scalar instruction queue and to a vector
instruction queue (The scalar required part of the instruction goes into the scalar queue, 
and the vector queue is disclosed in Column 3, Lines 34-36); 

executing the vector instruction in the scalar processing unit, wherein executing 
the vector instruction in the scalar processing unit includes generating an address and 
writing the address to a scalar operand queue (Column 3, Lines 38-40. In addition, 
Column 12, Lines 40-44 show an example of an address being generated in the scalar 
processing unit); 

notifying the vector processing unit that the address is available in the scalar 
operand queue (Column 11, Lines 44-53. It is notified by a bit in the scalar register
scoreboard. When the vector processing unit looks into the scoreboard, it can determine 
if the scalar operand is available); and

executing the vector instruction in the vector processing unit (Column 12, Lines
26-36, the scalar operand data is transferred to the vector register unit as it begins 
execution), wherein executing the vector instruction in the vector processing unit 
includes: 

reading the address from the scalar operand queue (Column 12, Lines 33-36 
show the vector register getting the operand (in this case the address) as it begins); 

generating a memory request as a function of the address read from the scalar 
operand queue (Column 3 Lines 27-40, wherein all memory requests are a function of 
the address, and all addresses must necessarily come from the scalar unit, and thus the 
scalar operand queue for the vector to use it. Also see Column 23, Lines 24-26); 

receiving vector data from memory (Column 23, Lines 24-26, when a load is 
requested from memory, it has to be received); 

executing a vector instruction on the vector data stored in the vector register
(Column 3, Lines 27-40), but fails to teach: 

dispatching scalar instructions to a scalar instruction queue;

storing the vector data in a load buffer; 

transferring the vector data from the load buffer to a vector register. 

While Beard does not explicitly teach a queue to put scalar instructions in before
execution, Gharachorloo teaches a reservation station and a reorder buffer (Page 5,
Section 4.2). A reservation station is well known in the art as a way for a processor to 
execute instructions out of order, and Beard does teach an out-of-order machine 
(Column 23, Lines 20-21). A reservation station acts as a queue or buffer to hold 
decoded instructions before execution. It allows the instruction decoding to be
decoupled from instruction execution, allowing dynamic scheduling of instructions. One 
of ordinary skill in the art would recognize the advantage of dynamic scheduling of 
instructions as potentially bypassing unnecessary conflicts (along with the use of a 
reorder buffer) and increasing performance, and to also allow speculative execution. 
Therefore, one of ordinary skill in the art at the time the invention was made would have 
used a reservation station and a reorder buffer to implement the out-of-order execution 
in Beard's invention to increase performance.

Furthermore, while Beard teaches an instruction cache, which consists of 
multiple buffers (Column 7, Lines 52-55), he does not explicitly teach a data cache, 
which is commonly used in computer systems to vastly increase performance in loads 
and stores. Patterson teaches that most computers have 2 levels of caches now, both 
instruction and data caches, with Pages 380-381 showing an example data cache. As it 
is well known in the art, the cache would read the data directly from memory (or from a 
higher cache level), and the computer system would then read the data from the cache 
(buffer) into the appropriate register(s). One of ordinary skill in the art would have 
recognized the need to add a data cache into Beard's system, for the same reason there
is an instruction cache: increased performance and faster memory reads. Therefore,
one of ordinary skill in the art at the time the invention was made would have added a
data cache (which can function as a load buffer) into Beard's invention to increase
performance. 




22. As per Claim 11, Beard teaches: In a computer system having a scalar
processing unit and a vector processing unit, a method of unrolling a loop, comprising: 

preparing a first and a second vector instruction (Column 3, Lines 1-3 is one of
many examples showing Beard's invention is capable of executing multiple instructions),

dispatching the first and second vector instructions to the scalar instruction 
queue and to a vector instruction queue (The scalar required part of the instruction goes
into the scalar queue, and the vector queue is disclosed in Column 3, Lines 34-36); 

executing each vector instruction in the scalar processing unit (Column 3, Lines 
38-40. In addition, Column 12, Lines 40-44 show an example of an address being 
generated in the scalar processing unit), 

notifying the vector processing unit that the scalar operand is available in the 
scalar operand queue (Column 11, Lines 44-53. It is notified by a bit in the scalar
register scoreboard. When the vector processing unit looks into the scoreboard, it can 
determine if the scalar operand is available); 

executing the first and second vector instructions in the vector processing unit, 
wherein executing the vector instruction in the vector processing unit includes reading 
the scalar operands associated with each instruction from the scalar operand queue 
(Column 12, Lines 26-36, the scalar operand data is transferred to the vector register 
unit as it begins execution, and is done only when each instruction needs it), but fails to 
teach: 

wherein each vector instruction executes an iteration through the loop and
wherein each vector instruction requires calculation of a scalar loop value;

wherein executing each vector instruction in the scalar processing unit includes
writing a scalar operand representing the scalar loop value calculated for each vector 
instruction to a scalar operand queue. 

While Beard does not explicitly teach a queue to put scalar instructions in before
execution, Gharachorloo teaches a reservation station and a reorder buffer (Page 5,
Section 4.2). A reservation station is well known in the art as a way for a processor to 
execute instructions out of order, and Beard does teach an out-of-order machine 
(Column 23, Lines 20-21). A reservation station acts as a queue or buffer to hold 
decoded instructions before execution. It allows the instruction decoding to be 
decoupled from instruction execution, allowing dynamic scheduling of instructions. One 
of ordinary skill in the art would recognize the advantage of dynamic scheduling of 
instructions as potentially bypassing unnecessary conflicts (along with the use of a 
reorder buffer) and increasing performance, and to also allow speculative execution. 
Therefore, one of ordinary skill in the art at the time the invention was made would have 
used a reservation station and a reorder buffer to implement the out-of-order execution 
in Beard's invention to increase performance.

While Beard also does not explicitly teach vector instructions each of which
executes an iteration of a loop, Patterson gives an example of
how vector instructions can be used for this purpose (Pages B-7 and B-8). In this
example, a MULTSV instruction is used for the first part, and an ADDV instruction for
the second, although one of ordinary skill in the art could also see how this could
be modified to be done in one or multiple vector instructions, depending on the
required steps of the loop in question. It can also be seen that in this example, "a" is a
scalar operand required for the vector instructions to execute. One of ordinary skill in
the art would be able to see the value in using vector instructions to execute this loop,
as shown by the vastly decreased code size in the example on Page B-8.
Therefore, one of ordinary skill in the art at the time the invention was made would have
recognized the advantage of using vector instructions for the purpose of executing loops
in Beard's invention, to decrease code size, and therefore increase performance.
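
The code-size comparison can be illustrated with a short sketch. It assumes, as an illustration only, that the loop in question has the form Y = a*X + Y; the text above states only that a MULTSV is followed by an ADDV and that "a" is a scalar operand. The Python functions multsv and addv are hypothetical stand-ins that model one vector instruction each, not actual machine instructions.

```python
def scalar_loop(a, x, y):
    # Scalar version: one multiply and one add are issued per loop iteration.
    result = []
    for i in range(len(x)):
        result.append(a * x[i] + y[i])
    return result


def multsv(scalar, vector):
    # Models a single vector-scalar multiply over all elements.
    return [scalar * element for element in vector]


def addv(left, right):
    # Models a single vector-vector add over all elements.
    return [p + q for p, q in zip(left, right)]


def vector_version(a, x, y):
    # Two vector operations replace the whole loop; "a" is the scalar operand
    # that must be supplied before the multiply can begin.
    return addv(multsv(a, x), y)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
loop_result = scalar_loop(2.0, x, y)
vector_result = vector_version(2.0, x, y)
```

Both versions compute the same values; the vector version expresses the loop body as two operations regardless of the vector length.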



Conclusion 

23. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure as follows. Applicant is reminded that in amending in response to 
a rejection of claims, the patentable novelty must be clearly shown in view of the state 
of the art disclosed by the references cited and the objections made. Applicant must 
also show how the amendments avoid such references and objections. See 37 CFR § 
1.111(c). 

24. DeLano et al. (USPN 5,787,494) teaches how a trap is used after a translation 
fault. 

25. Nagashima et al. (USPN 4,541,046) teaches a scalar processing unit and a
vector processing unit, where the scalar processing unit feeds necessary scalar data to 
the vector processing unit. 